Logistic regression analysis

Logistic regression analyses (logit/probit) estimate how a set of explanatory variables affects the probability of an outcome recorded in a dichotomous response variable (employed/not employed, action/no action, etc.). Options let you adapt the output, for example by hiding the constant term or changing the significance level.
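As a brief sketch of what the two models estimate: with a response $y$ that is 0 or 1 and explanatory variables $x_1, \dots, x_k$, the standard textbook forms are shown below. The symbols $\beta_0, \dots, \beta_k$ (coefficients), $\Lambda$ (logistic function) and $\Phi$ (standard normal cumulative distribution function) are notation introduced here for illustration and do not come from the tool's output.

```latex
\begin{align}
\text{logit:}\quad  & P(y = 1 \mid x) = \Lambda(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k),
                      \qquad \Lambda(z) = \frac{1}{1 + e^{-z}} \\
\text{probit:}\quad & P(y = 1 \mid x) = \Phi(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)
\end{align}
```

Both forms map the linear combination of explanatory variables to a probability between 0 and 1; they differ only in the function used for that mapping.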

The example below demonstrates a logit analysis; probit can be used as an alternative. Multinomial analyses (more than two outcomes) can be run with the command mlogit.

//Connect to database
require no.ssb.fdb:23 as db

//Start by importing relevant variables
create-dataset demographydata
import db/BEFOLKNING_KJOENN as gender
import db/BEFOLKNING_FOEDSELS_AAR_MND as birth_year_month
import db/BEFOLKNING_STATUSKODE 2020-01-01 as regstat
import db/SIVSTANDFDT_SIVSTAND 2020-01-01 as civstat
import db/INNTEKT_BRUTTOFORM 2020-01-01 as wealth
import db/INNTEKT_WYRKINNT 2021-01-01 as work_income21

//Limit the population
generate age = 2020 - int(birth_year_month / 100)
keep if regstat == '1' & age > 15 & age < 67

//Generate a dependent variable with two outcomes (dummy variable): High work_income vs. low work_income
generate high_income = 0
replace high_income = 1 if work_income21 > 800000

//Adapt the independent variables so that they suit the statistical model (most variables need to be transformed into dummy variables)
generate male = 0
replace male = 1 if gender == '1'

generate married = 0
replace married = 1 if civstat == '2'

generate wealth_high = 0
replace wealth_high = 1 if wealth > 1500000

//Run the logit analysis, where the dependent variable (dummy) is always listed first
logit high_income male married age wealth_high
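
Written out, the logit command above corresponds to the model sketched below, assuming the standard logit specification with a constant term. The coefficient symbols $\beta_0, \dots, \beta_4$ are illustrative notation, not names produced by the analysis.

```latex
\begin{align}
\ln\frac{p}{1 - p} &= \beta_0 + \beta_1\,\text{male} + \beta_2\,\text{married}
                      + \beta_3\,\text{age} + \beta_4\,\text{wealth\_high},
                      \qquad p = P(\text{high\_income} = 1) \\
\text{odds ratio for male} &= e^{\beta_1}
\end{align}
```

Under this specification, a positive $\beta_1$ means that men have higher odds of high work income than women of the same age, marital status, and wealth group, and $e^{\beta_1}$ is the factor by which those odds differ.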